An on-line algorithm for dynamic reinforcement learning and planning in reactive environments

نویسنده

  • Jürgen Schmidhuber
چکیده

An on line learning algorithm for reinforcement learning with continually running recur rent networks in non stationary reactive environments is described Various kinds of rein forcement are considered as special types of input to an agent living in the environment The agent s only goal is to maximize the amount of reinforcement received over time Supervised learning techniques for recurrent networks serve to construct a di erentiable model of the environmental dynamics which includes a model of future reinforcement This model is used for learning goal directed behavior in an on line fashion The method extends work done by Munro Robinson and Fallside Werbos Widrow and Jordan The possibility of using the system for planning future action sequences is investigated and this approach is compared to approaches based on temporal di erence methods A connection to meta learning learning how to learn is noted

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic Environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

متن کامل

User-based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi Agents Systems and the ANT-Q Algorithm

Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of Intelligent Transportation Systems (ITS). Nowadays, many complex systems can be controlled by using multi agent systems. Adaptation with the current condition is an important feature of the agents. In this research, formulation of dynamic guidance for vehicles has been investigated based...

متن کامل

Learning to Plan Probabilistically

This paper discusses the learning of probabilistic planning without a priori domain-specific knowledge. Different from existing reinforcement learning algorithms that generate only reactive policies and existing probabilistic planning algorithms that requires a substantial amount of a priori knowledge in order to plan, we devise a two-stage bottom-up learning-to-plan process, in which first rei...

متن کامل

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

: In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it firstly is formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

متن کامل

Chaotic Genetic Algorithm based on Explicit Memory with a new Strategy for Updating and Retrieval of Memory in Dynamic Environments

Many of the problems considered in optimization and learning assume that solutions exist in a dynamic. Hence, algorithms are required that dynamically adapt with the problem’s conditions and search new conditions. Mostly, utilization of information from the past allows to quickly adapting changes after. This is the idea underlining the use of memory in this field, what involves key design issue...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1990